<html>

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Time Domain Audio Visual Speech Separation</title>
  <link rel="stylesheet" type="text/css" href="resources/stylesheet.css">
  <!--<link rel="shortcut icon" href="https://ai.tencent.com/ailab/images/favicon.ico">-->
</head>

<body>
  <div style="margin-right: auto;margin-left: auto; margin-top: 30px; margin-bottom: 100px; width: 70%;">
    <div style="text-align: center;">
      <h1>Time Domain Audio Visual Speech Separation</h1>
      <p><b>Authors:</b> Blind </p>
    </div>

    <p><b>Abstract:</b>
      Audio-visual multi-modal modeling has been shown to be effective in many speech-related tasks, such as
      speech recognition and speech enhancement. This paper introduces a new time-domain audio-visual architecture for
      target speaker extraction from monaural mixtures. The architecture generalizes the previous TasNet (time-domain
      speech separation network) to enable multi-modal learning and, at the same time, extends classical audio-visual
      speech separation from the frequency domain to the time domain. The main components of the proposed architecture
      are an audio encoder, a video encoder that extracts lip embeddings from video streams, a multi-modal separation
      network, and an audio decoder. Experiments on simulated mixtures based on the recently released LRS2 dataset show
      that our method brings Si-SNR improvements of more than 3 dB and 4 dB in the two- and three-speaker cases,
      respectively, compared to audio-only TasNet and frequency-domain audio-visual networks.
    </p>
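    <p>As a reading aid for the Si-SNR figures quoted in the abstract, here is a minimal pure-Python sketch of the
      scale-invariant signal-to-noise ratio as it is commonly defined in the time-domain separation literature. This
      page does not spell out the formula, so the definition used here, the function name <code>si_snr</code> and the
      stabilising constant <code>eps</code> are assumptions made for illustration, not the authors' evaluation code.
    </p>
<pre>
```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Hypothetical helper: the name and eps are illustrative assumptions.
    """
    n = len(reference)
    if len(estimate) != n:
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the target is noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps

    return 10.0 * math.log10(target_energy / noise_energy + eps)
```
</pre>
    <p>Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves
      the value essentially unchanged, so the measure does not reward or penalise the output gain of a model.</p>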

    <h2>Network architecture:</h2>
    <!--<p style="text-align: center;"><img src="architecture.png" width="90%"></p>-->
    <div style="text-align: center;"><img src="resources/detailed-model.png" width="80%"></div>

    <h2>Audio samples from our simulated testing set based on LRS2<sup>[1]</sup></h2>
    <ol>
      <li>The proposed model used here was trained with the multi-speaker technique and achieves 14.02 dB and 9.92 dB
        on the two- and three-speaker test sets, respectively.</li>
      <li>For BLSTM-uPIT<sup>[3]</sup>, we used a three-layer BLSTM with a dropout rate of 0.5 and adopted the
        phase-sensitive mask (PSM) as the training target, which proved better than the ideal ratio mask (IRM). A
        257-dimensional linear spectrogram (10 ms hop size, Hann window) was extracted as the input feature.
      </li>
      <li>Conv-TasNet<sup>[2]</sup> was trained with a larger bottleneck size (B=384) to improve performance,
        starting from the best non-causal configuration in the original paper.</li>
      <li>The frequency-domain audio-visual model presented here is denoted Conv-FavsNet. It removes the audio
        encoder from the proposed time-domain network and replaces the decoder with a linear layer that transforms the
        output of the convolutional blocks into TF masks. We used a 321-dimensional spectrogram (10 ms hop size, Hann
        window) as the audio feature and the same lip embeddings as in item 1.</li>
    </ol>
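    <p>The feature sizes and the training target mentioned in the notes above can be illustrated with a short sketch.
      It assumes a one-sided STFT (an FFT of size N yields N/2 + 1 bins), a 16 kHz sample rate, and the usual
      phase-sensitive mask definition |S|/|Y| cos(theta_S - theta_Y) without clipping. The FFT sizes 512 and 640 are
      inferred from the 257 and 321 dimensions rather than stated on this page, and all function names are
      hypothetical.</p>
<pre>
```python
import cmath
import math


def num_bins(n_fft):
    """Number of bins in the one-sided spectrum of a real signal."""
    return n_fft // 2 + 1


def hop_samples(hop_ms, sample_rate):
    """Convert a hop size in milliseconds to a number of samples."""
    return int(round(hop_ms * sample_rate / 1000.0))


def phase_sensitive_mask(source_bin, mixture_bin, eps=1e-8):
    """PSM for one time-frequency bin: |S|/|Y| * cos(theta_S - theta_Y).

    source_bin and mixture_bin are complex STFT values of the clean source
    and of the mixture; eps (an assumption) avoids division by zero.
    """
    ratio = abs(source_bin) / (abs(mixture_bin) + eps)
    phase_diff = cmath.phase(source_bin) - cmath.phase(mixture_bin)
    return ratio * math.cos(phase_diff)
```
</pre>
    <p>Under these assumptions, <code>num_bins(512)</code> gives 257 and <code>num_bins(640)</code> gives 321, and a
      10 ms hop corresponds to 160 samples. The mask shrinks towards zero as the source and mixture phases diverge,
      which is the property that distinguishes PSM from a magnitude-only ratio mask.</p>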

    <h3>A. Two-speaker samples</h3>
    <table>
      <thead>
        <tr>
          <th>Mixture input</th>
          <th>BLSTM-uPIT</th>
          <th>Conv-TasNet</th>
          <th>Conv-FavsNet</th>
          <th>Proposed</th>
          <th>Ground Truth</th>
        </tr>
      </thead>
      <tbody>
        <tr style="border-top:1px solid black">
          <td rowspan="2"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/mix/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk1/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk1/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk1/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk1/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk1/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk2/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk2/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk2/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk2/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk2/6306870852828565379-00004_6306811582279909115-00009_+0.02.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="2"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/mix/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk1/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk1/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk1/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk1/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk1/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk2/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk2/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk2/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk2/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk2/6326414672012357313-00085_6323956232732126743-00016_-1.87.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="2"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/mix/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk1/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk1/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk1/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk1/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk1/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk2/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk2/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk2/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk2/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk2/6348834401297406331-00002_6311014637275749987-00036_+1.98.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="2"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/mix/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk1/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk1/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk1/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk1/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk1/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk2/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk2/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk2/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk2/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk2/6360322579951237696-00006_6327713470253076639-00017_+3.40.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="2"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/mix/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk1/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk1/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk1/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk1/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk1/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/upit/spk2/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tas/spk2/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/favs/spk2/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/tavs/spk2/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source src="audio/2spk/ref/spk2/6383716407688597209-00035_6326553828952690654-00007_-4.40.wav"
                type="audio/wav"></audio></td>
        </tr>
      </tbody>
    </table>

    <h3>B. Three-speaker samples</h3>
    <table>
      <thead>
        <tr>
          <th>Mixture input</th>
          <th>BLSTM-uPIT</th>
          <th>Conv-TasNet</th>
          <th>Conv-FavsNet</th>
          <th>Proposed</th>
          <th>Ground Truth</th>
        </tr>
      </thead>
      <tbody>
        <tr style="border-top:1px solid black">
          <td rowspan="3"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/mix/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk1/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk1/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk1/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk1/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk1/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk2/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk2/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk2/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk2/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk2/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk3/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk3/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk3/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk3/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk3/6300370419826092098-00001_6340299442417279404-00006_6329151425173313518-00006_-0.16_+3.46.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="3"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/mix/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk1/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk1/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk1/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk1/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk1/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk2/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk2/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk2/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk2/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk2/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk3/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk3/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk3/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk3/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk3/6323817075791736335-00056_6339758276407584924-00082_6306811582279909115-00014_+3.18_-3.56.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="3"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/mix/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk1/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk1/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk1/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk1/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk1/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk2/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk2/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk2/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk2/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk2/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk3/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk3/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk3/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk3/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk3/6328393792942295524-00005_6324764116080445123-00001_6334563083966338309-00106_+0.26_+1.37.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="3"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/mix/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk1/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk1/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk1/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk1/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk1/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk2/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk2/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk2/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk2/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk2/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk3/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk3/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk3/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk3/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk3/6352157417494386368-00051_6316024287129869379-00016_6325894121976022773-00008_+0.42_-0.13.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr style="border-top:1px solid black">
          <td rowspan="3"><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/mix/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk1/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk1/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk1/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk1/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk1/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk2/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk2/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk2/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk2/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk2/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
        </tr>
        <tr>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/upit/spk3/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tas/spk3/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/favs/spk3/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/tavs/spk3/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
          <td><audio controls class="audio-player" preload="metadata" style="width: 180px;">
              <source
                src="audio/3spk/ref/spk3/6360322579951237696-00004_6329151425173313518-00062_6309468449049188125-00024_-2.96_+0.98.wav"
                type="audio/wav"></audio></td>
        </tr>
      </tbody>
    </table>

    <h2>Selected comparison results</h2>
    <div style="text-align: center;"><img src="resources/comparison.png" alt="Comparison results" width="60%"></div>

    <h2>References</h2>
    <p>[1]. Noda K, Yamaguchi Y, Nakadai K, et al. Audio-visual speech recognition using deep learning[J]. Applied
      Intelligence, 2015, 42(4): 722-737.</p>
    <p>[2]. Luo Y, Mesgarani N. TasNet: Surpassing ideal time-frequency masking for speech separation[J]. arXiv
      preprint arXiv:1809.07454, 2018.</p>
    <p>[3]. Kolbæk M, Yu D, Tan Z H, et al. Multitalker speech separation with utterance-level permutation invariant
      training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing
      (TASLP), 2017, 25(10): 1901-1913.</p>
  </div>

</body>

</html>